\chapter{模块}

\section{random}

\subsection{random}

random模块提供了生成随机数据的功能。

\begin{table}[H]
    \centering
    \setlength{\tabcolsep}{5mm}{
        \begin{tabular}{|l|l|}
            \hline
            \textbf{方法} & \textbf{功能}                      \\
            \hline
            random()      & 随机生成一个$ [0, 1] $之间的浮点数 \\
            \hline
            randint(x, y) & 随机生成一个$ [x, y] $之间的整数   \\
            \hline
            choice()      & 从序列中随机返回一个元素           \\
            \hline
            shuffle()     & 将序列打乱                         \\
            \hline
            sample()      & 从序列中生成一组唯一的随机元素     \\
            \hline
        \end{tabular}
    }
    \caption{random模块}
\end{table}

\mybox{随机密码生成}

\begin{lstlisting}[language=Python]
import random
import string

def password_generator(length):
    characters = string.ascii_letters + string.digits
    return "".join(random.choice(characters) for _ in range(length))

def main():
    length = int(input("Enter length of password: "))
    print(password_generator(length))

if __name__ == "__main__":
    main()
\end{lstlisting}

\begin{tcolorbox}
    \mybox{运行结果}
    \begin{verbatim}
Enter length of password: 8
@o8VBuiV
\end{verbatim}
\end{tcolorbox}

\newpage

\section{copy}

\subsection{引用（Reference）}

引用是指同一块内存空间被交给不同的对象进行操作，当一个对象修改了数据后，另一个对象的数据也会发生改变。

\vspace{-0.5cm}

\begin{lstlisting}[language=Python]
a = {1: [1, 2, 3]}
b = a
\end{lstlisting}

\begin{figure}[H]
    \centering
    \begin{tikzpicture}
        \draw (0,0) rectangle node{\{1: \textcolor{blue}{[1, 2, 3]}\}} (2.5,1);
        \draw (-2,1.5) circle(0.5) node{a};
        \draw (-2,-0.5) circle(0.5) node{b};
        \draw (3,-2) rectangle node{\textcolor{blue}{[1, 2, 3]}} (5,-1);

        \draw[->] (-1.5,1.5) -- (0,1);
        \draw[->] (-1.5,-0.5) -- (0,0);
        \draw[->, blue] (1.4,0.2) -- (1.4,-1.5) -- (3,-1.5);
    \end{tikzpicture}
    \caption{引用}
\end{figure}

\vspace{0.5cm}

\mybox{引用}

\begin{lstlisting}[language=Python]
info = dict(name="Alice", skills=["Python", "C"])
info_ref = info
info_ref["skills"].append("Java")
print(info)
\end{lstlisting}

\begin{tcolorbox}
    \mybox{运行结果}
    \begin{verbatim}
{'name': 'Alice', 'skills': ['Python', 'C', 'Java']}
\end{verbatim}
\end{tcolorbox}

\vspace{0.5cm}

\subsection{copy}

copy模块提供了拷贝对象的功能。

copy是一个专门进行内容拷贝的模块。拷贝分为浅拷贝（shallow copy）和深拷贝（deep copy）：

\begin{enumerate}
    \item 浅拷贝：只拷贝父对象，不会拷贝对象的内部的子对象。
    \item 深拷贝：完全拷贝父对象及其子对象。
\end{enumerate}

\vspace{-0.5cm}

\begin{lstlisting}[language=Python]
a = {1: [1, 2, 3]}
b = a.copy()
\end{lstlisting}

\begin{figure}[H]
    \centering
    \begin{tikzpicture}
        \draw (0,1) rectangle node{\{1: \textcolor{blue}{[1, 2, 3]}\}} (2.5,2);
        \draw (-2,1.5) circle(0.5) node{a};
        \draw (0,-1) rectangle node{\{1: \textcolor{red}{[1, 2, 3]}\}} (2.5,0);
        \draw (-2,-0.5) circle(0.5) node{b};
        \draw (4,0) rectangle node{[1, 2, 3]} (6,1);

        \draw[->] (-1.5,1.5) -- (0,1.5);
        \draw[->] (-1.5,-0.5) -- (0,-0.5);
        \draw[->, blue] (1.4,1.2) -- (1.4,0.75) -- (4,0.75);
        \draw[->, red] (1.4,-0.2) -- (1.4,0.3) -- (4,0.3);
    \end{tikzpicture}
    \caption{浅拷贝}
\end{figure}

\vspace{0.5cm}

\mybox{浅拷贝}

\begin{lstlisting}[language=Python]
import copy

info = dict(name="Alice", skills=["Python", "C"])
info_copy = copy.copy(info)
info.pop("name")
info_copy["skills"].append("Java")

print(info)
print(info_copy)
\end{lstlisting}

\newpage

\begin{tcolorbox}
    \mybox{运行结果}
    \begin{verbatim}
{'skills': ['Python', 'C', 'Java']}
{'name': 'Alice', 'skills': ['Python', 'C', 'Java']}
\end{verbatim}
\end{tcolorbox}

\begin{lstlisting}[language=Python]
a = {1: [1, 2, 3]}
b = copy.deepcopy(a)
\end{lstlisting}

\begin{figure}[H]
    \centering
    \begin{tikzpicture}
        \draw (0,1) rectangle node{\{1: \textcolor{blue}{[1, 2, 3]}\}} (2.5,2);
        \draw (-2,1.5) circle(0.5) node{a};
        \draw (0,-1) rectangle node{\{1: \textcolor{red}{[1, 2, 3]}\}} (2.5,0);
        \draw (-2,-0.5) circle(0.5) node{b};
        \draw (4,0) rectangle node{\textcolor{blue}{[1, 2, 3]}} (6,1);
        \draw (4,-2) rectangle node{\textcolor{red}{[1, 2, 3]}} (6,-1);

        \draw[->] (-1.5,1.5) -- (0,1.5);
        \draw[->] (-1.5,-0.5) -- (0,-0.5);
        \draw[->, blue] (1.4,1.2) -- (1.4,0.5) -- (4,0.5);
        \draw[->, red] (1.4,-0.8) -- (1.4,-1.5) -- (4,-1.5);
    \end{tikzpicture}
    \caption{深拷贝}
\end{figure}

\vspace{0.5cm}

\mybox{深拷贝}

\begin{lstlisting}[language=Python]
import copy

info = dict(name="Alice", skills=["Python", "C"])
info_copy = copy.deepcopy(info)
info.pop("name")
info_copy["skills"].append("Java")

print(info)
print(info_copy)
\end{lstlisting}

\begin{tcolorbox}
    \mybox{运行结果}
    \begin{verbatim}
{'skills': ['Python', 'C']}
{'name': 'Alice', 'skills': ['Python', 'C', 'Java']}
\end{verbatim}
\end{tcolorbox}

\newpage

\section{MapReduce}

\subsection{MapReduce}

MapReduce在数据处理中可以用于大规模的数据过滤、分析、统计等操作。\\

\begin{table}[H]
    \centering
    \setlength{\tabcolsep}{5mm}{
        \begin{tabular}{|l|l|}
            \hline
            \textbf{函数} & \textbf{功能}      \\
            \hline
            filter()      & 过滤序列           \\
            \hline
            map()         & 对序列数据进行处理 \\
            \hline
            reduce()      & 对序列数据进行统计 \\
            \hline
        \end{tabular}
    }
    \caption{MapReduce数据处理函数}
\end{table}

在数据处理的过程中需要一个处理函数，用于指定数据如何进行处理或统计，一般而言这样的函数都比较短，所以大部分情况下都可以利用lambda函数来完成。\\

\mybox{奇数平方和}

\begin{lstlisting}[language=Python]
from functools import reduce

lst = list(range(10))
print("lst =", lst)

odds = list(filter(lambda x: x % 2 == 1, lst))
print("odds =", odds)

squares = list(map(lambda x: x ** 2, odds))
print("squares =", squares)

sum = reduce(lambda x, y: x + y, squares)
print("sum =", sum)
\end{lstlisting}

\begin{tcolorbox}
    \mybox{运行结果}
    \begin{verbatim}
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
odds = [1, 3, 5, 7, 9]
squares = [1, 9, 25, 49, 81]
sum = 165
\end{verbatim}
\end{tcolorbox}

\newpage

\section{jieba}

\subsection{pip}

除了Python内置的模块，开发者还可以安装由社区开发并维护的第三方模块。pip管理工具可以用于安装、卸载、更新第三方模块。\\

\begin{table}[H]
    \centering
    \setlength{\tabcolsep}{5mm}{
        \begin{tabular}{|l|l|}
            \hline
            \textbf{功能}  & \textbf{命令}                  \\
            \hline
            搜索模块       & pip search [module]            \\
            \hline
            安装模块       & pip install [module]           \\
            \hline
            查看已安装模块 & pip list                       \\
            \hline
            查看过期模块   & pip list --outdated            \\
            \hline
            更新模块       & pip install --upgrade [module] \\
            \hline
            卸载模块       & pip uninstall [module]         \\
            \hline
        \end{tabular}
    }
    \caption{pip命令}
\end{table}

\vspace{0.5cm}

\subsection{jieba}

jieba是一个在中文自然语言处理中较为常用的工具包之一，用于将中文文本分割成词语。\\

jieba支持三种分词模式：

\begin{enumerate}
    \item 精确模式：最精确的切分，但是速度较慢。
    \item 全模式：扫描出所有可以成词的词语，速度非常快，但是不能有效解决歧义。
    \item 搜索引擎模式：在精确模式的基础上，对长词进行再次切分，适用于搜索引擎分词。
\end{enumerate}

\vspace{0.5cm}

\mybox{词频统计}

\begin{lstlisting}[language=Python]
import jieba

FILENAME = "西游记.txt"

def read_file(FILENAME):
    with open(file=FILENAME, mode="r", encoding="UTF-8") as file:
        return file.readlines()

def word_frequency(text, n=10):
    frequency = {}

    for line in text:
        words = jieba.lcut(line)
        for word in words:
            if len(word) == 1:
                continue
            
            frequency[word] = frequency.get(word, 0) + 1

    items = list(frequency.items())
    items.sort(key=lambda x: x[1], reverse=True)
    return items[:n]

def main():
    text = read_file(FILENAME)
    frequency = word_frequency(text, 20)
    for i, item in enumerate(frequency):
        print("No.%2d: %s - %d" % (i + 1, item[0], item[1]))
    
if __name__ == "__main__":
    main()
\end{lstlisting}

\begin{tcolorbox}
    \mybox{运行结果}
    \begin{verbatim}
No. 1: 行者 - 4078
No. 2: 八戒 - 1677
No. 3: 师父 - 1604
No. 4: 三藏 - 1324
No. 5: 一个 - 1089
No. 6: 大圣 - 889
No. 7: 唐僧 - 802
No. 8: 那里 - 767
No. 9: 怎么 - 754
No.10: 菩萨 - 730
No.11: 我们 - 725
No.12: 沙僧 - 721
No.13: 不知 - 657
No.14: 和尚 - 644
No.15: 妖精 - 631
No.16: 两个 - 594
No.17: 甚么 - 551
No.18: 长老 - 512
No.19: 不是 - 507
No.20: 只见 - 485
\end{verbatim}
\end{tcolorbox}

\newpage